
Google Provides Guidance on 404 and 410 Status Codes

Questions come up from time to time about how Google handles the 404 and 410 status codes and whether the difference between them matters. Google’s John Mueller recently answered a question about web pages that no longer exist and how web publishers should handle them.

How Google Handles 404/410 Status Codes

During a recent Webmaster Hangout, Google’s John Mueller was asked:

“If a 404 error goes to a page that doesn’t exist, should I make them a 410?”

John Mueller responded:

“From our point of view, in the mid term/long term, a 404 is the same as a 410 for us. So in both of these cases, we drop those URLs from our index.

We generally reduce crawling of those URLs so that we don’t spend too much time crawling things that we know don’t exist.

The subtle difference here is that a 410 will sometimes fall out a little bit faster than a 404. But usually, we’re talking on the order of a couple days or so.

So if you’re just removing content naturally, then it’s perfectly fine to use either one. If you’ve already removed this content long ago, then it’s already not indexed so it doesn’t matter for us if you use a 404 or 410.”

This answer is useful because Mueller confirms that a 410 status code can sometimes get a URL dropped from Google’s index slightly faster than a 404, although the difference is usually only a couple of days. Even a small speed-up can matter after a hacking incident in which an attacker published numerous spam pages on a site, because a publisher would not want those pages associated with the site. Serving a 410 for those URLs may help them drop out of Google’s index a little sooner.
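One way to act on this is to have the server answer the deleted spam URLs with a 410 and keep answering everything else it does not recognize with a 404. The sketch below uses only Python’s standard-library `http.server`. It assumes the removed URLs are known and listed by hand. The paths, the `REMOVED_PATHS` and `LIVE_PATHS` sets, and the `status_for_path` helper are hypothetical and exist only for illustration. On a real site, the same rule would more likely be set up in the web server or CMS configuration.

```python
"""Minimal sketch: answer 410 Gone for URLs that were deliberately removed.

Assumptions (hypothetical, for illustration only): the spam URLs left behind
by a hack are listed in REMOVED_PATHS, real pages are listed in LIVE_PATHS,
and every other path gets a 404.
"""
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical spam URLs injected by an attacker and since deleted.
REMOVED_PATHS = {
    "/cheap-pills.html",
    "/casino-bonus.html",
}

# Hypothetical pages that really exist on the site.
LIVE_PATHS = {"/", "/about.html"}


def status_for_path(path: str) -> HTTPStatus:
    """Choose the status code to send for a request path."""
    if path in LIVE_PATHS:
        return HTTPStatus.OK
    if path in REMOVED_PATHS:
        # Intentionally and permanently removed: say so explicitly.
        return HTTPStatus.GONE
    # Unknown path: no claim that the absence is permanent, so use 404.
    return HTTPStatus.NOT_FOUND


class Handler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # Ignore any query string when matching paths.
        path = self.path.split("?", 1)[0]
        status = status_for_path(path)
        body = f"{status.value} {status.phrase}\n".encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Serve on localhost:8000 for local testing only.
    HTTPServer(("127.0.0.1", 8000), Handler).serve_forever()
```

Unknown paths still get a 404. Per Mueller, Google eventually drops either one, so the 410 is reserved for URLs the owner knows are permanently gone.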

Official 410 Status Code Specifications

Here is what the HTTP/1.1 specification (RFC 2616) says about the 410 status code:

“…Clients with link editing capabilities SHOULD delete references to the Request-URI after user approval… The 410 response is primarily intended to assist the task of web maintenance by notifying the recipient that the resource is intentionally unavailable and that the server owners desire that remote links to that resource be removed.

It is not necessary to mark all permanently unavailable resources as “gone” or to keep the mark for any length of time -- that is left to the discretion of the server owner.”

The word “SHOULD” is significant. In specification language it signals a recommendation rather than a requirement: clients are encouraged, but not obligated, to delete references to the resource. Historically, Google has been quick to drop pages that return a 410, which is consistent with the intent of the specification.

Previous Google Guidance on 404 and 410

John Mueller’s response is consistent with earlier guidance from Matt Cutts during his time at Google. Cutts explained that when Google encountered a 404 status code, it would wait 24 hours before starting the process of removing the page from its index.

This delay is a safeguard against errors or accidents on the website’s side. For instance, a web server might be down, or a site migration could take longer than anticipated. John Mueller did not comment on whether this still applies, but Matt Cutts previously stated:

“Webmasters often make mistakes. Pages can disappear. People can misconfigure sites. Sites may go down. People can accidentally block Googlebot.

So, considering the vast web, the crawl team must ensure robustness. With 404s, and perhaps 401s and 403s, if a 404 is encountered, we preserve that page for 24 hours in the crawling system.

We pause to verify whether it was a temporary 404. It may not have been intended to signify a missing page.

In the crawling system, it remains protected for 24 hours.

If a 410 is seen, the crawling system assumes the webmaster’s intention and promptly registers it as an error without a 24-hour protection.

We’ll still revisit to confirm the pages are truly gone or to check if they have returned.

It’s crucial not to assume this behavior will remain unchanged.

If a page is truly gone, serving a 404 is acceptable. If confirmed as permanently gone, a 410 is suitable.

Our crawling system is designed to be resilient, so that site downtime, hacking incidents, and other problems can be handled and content that is still available can be recovered.”

